-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate string replace.pxd to pylibcudf #15839
Migrate string replace.pxd to pylibcudf #15839
Conversation
set(linked_libraries cudf::cudf) | ||
rapids_cython_create_modules( | ||
CXX | ||
SOURCE_FILES "${cython_sources}" | ||
LINKED_LIBRARIES "${linked_libraries}" MODULE_PREFIX pylibcudf_ ASSOCIATED_TARGETS cudf | ||
LINKED_LIBRARIES "${linked_libraries}" MODULE_PREFIX pylibcudf_strings ASSOCIATED_TARGETS cudf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed since the string replace.pyx clashes with the regular replace.pyx
)) | ||
else: | ||
# Column case | ||
# TODO: maxrepl should be supported in the corresponding CUDA/C++ code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the overload of replace in libcudf where input/target/repl are columns, there isn't a maxrepl arg.
We should probably support this in libcudf replace (eventually), otherwise we'll have some weirdness in pylibcudf where we'll have to raise for maxrepl despite accepting it as an argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Can you raise an issue?
In the meantime, I would recommend that we change the default value of the parameter to None, then raise a NotImplementedError in this branch of the code if we find a non-None value, while in the Scalar branch we set it to -1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably should go in after #15503 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some very minor nits to pick, but overall looks great!
)) | ||
else: | ||
# Column case | ||
# TODO: maxrepl should be supported in the corresponding CUDA/C++ code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Can you raise an issue?
In the meantime, I would recommend that we change the default value of the parameter to None, then raise a NotImplementedError in this branch of the code if we find a non-None value, while in the Scalar branch we set it to -1.
@pytest.fixture(scope="module", params=["a", "c", "A", "Á", "aa", "ÁÁÁ"]) | ||
def scalar_repl_target(request): | ||
pa_target = pa.scalar(request.param, type=pa.string()) | ||
return (request.param, plc.interop.from_arrow(pa_target)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting approach. This accomplishes the same thing in spirit that the other tests have done with a plc fixture that's constructed from the arrow fixture, just in a slightly different way. The more I stare at it, I'm starting to prefer this approach a little since it ties the two objects very tightly together in a way that reflects that they'll probably always be used together in tests. I suppose it's largely a matter of taste though. @mroeschke @brandon-b-miller WDYT? I would generally prefer to be consistent across tests; it's not a huge deal, but given that our test suite is still fairly small I'd be fine with a quick PR to standardize on this approach if we like it better (or change this to the separate fixture approach if we prefer that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what we do in pandas.
The advantage of this approach is being able to pair the values together.
(which is good since you'll never have cases where the fixtures get out of sync)
The disadvantage is that you have to add a line unpacking the tuple in every test that uses the fixture.
I'm happy with either, just did it this way since it was more familiar for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think I prefer this approach. After this PR merges, could you make a follow-up that standardizes this in other tests? Basically just removing pairs of (pyarrow,pylibcudf) fixtures in favor of a single fixture returning the pair? You can also pair up the column fixtures in this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
I'm holding off on a second review until #15503 is merged, but in the meantime could you add a short PR description please? |
Lets wait for #15898 |
#15898 is merged |
OK, this is re-sync'ed and ready for re-review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of small suggestions and a request to propagate the change for testing fixtures forward. Otherwise LGTM!
Co-authored-by: Vyas Ramasubramani <[email protected]>
/merge |
Condense all pa_foo/plc_foo data fixtures into just foo, as recommended by #15839 (comment). Authors: - Thomas Li (https://github.com/lithomas1) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #15958
Description
xref #15162
Change replace.pxd to use pylibcudf APIs.
Checklist